Language models and probability of relevance

Authors

  • Stephen Robertson
  • Djoerd Hiemstra
Abstract

D is a document and {Ti} are the query terms. The whole expression represents the probability that the query could have been generated from the language model representing the document (here simplified to the P(Ti|D) values), with a smoothing parameter λ that gives each term some chance of coming from a general language model (the P(Ti) values). The conception is that the user has a document in mind and generates the query from it; the equation then represents the probability that the document the user had in mind was in fact this one.

Hiemstra [1] gives the same equation a slightly different justification. The basic assumption is the same (the user is assumed to have a specific document in mind and to generate the query on the basis of this document), but instead of smoothing, the user is assumed to assign a binary importance value to each term position in the query. An important term position is filled with a term from the document; a non-important one is filled with a general language term. If we define λi = P(term position i is important), then we get
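The equation the abstract refers to is not reproduced on this page. As a minimal sketch, assuming the usual linearly interpolated form in which each query term contributes (1 - λi) P(Ti) + λi P(Ti|D) and the document prior is uniform (and therefore dropped), the score could be computed as below. The function name, the maximum-likelihood estimates from raw term counts, and the toy counts are illustrative assumptions, not taken from the paper.

```python
import math


def query_log_likelihood(query_terms, doc_counts, coll_counts, lambdas):
    """Log-probability of generating the query from a smoothed document model.

    Assumed form (not quoted from the paper):
        sum_i log((1 - lambda_i) * P(T_i) + lambda_i * P(T_i | D))
    `lambdas` is either one float shared by all positions (plain smoothing)
    or a sequence with one importance probability per term position.
    """
    if isinstance(lambdas, (int, float)):
        lambdas = [float(lambdas)] * len(query_terms)
    if len(lambdas) != len(query_terms):
        raise ValueError("need one lambda per query term position")

    doc_len = sum(doc_counts.values())
    coll_len = sum(coll_counts.values())

    score = 0.0
    for term, lam in zip(query_terms, lambdas):
        # Maximum-likelihood estimates; an assumption made for this sketch.
        p_doc = doc_counts.get(term, 0) / doc_len if doc_len else 0.0
        p_general = coll_counts.get(term, 0) / coll_len if coll_len else 0.0
        mixed = (1.0 - lam) * p_general + lam * p_doc
        if mixed == 0.0:
            return float("-inf")  # the term cannot be generated at all
        score += math.log(mixed)
    return score


# Toy counts, hypothetical: a two-word document and a four-word collection.
doc = {"a": 1, "b": 1}
collection = {"a": 1, "b": 1, "c": 2}
shared = query_log_likelihood(["a", "c"], doc, collection, 0.5)
per_position = query_log_likelihood(["a", "c"], doc, collection, [1.0, 0.0])
```

With a single shared λ this is ordinary smoothing; passing one value per position corresponds to the term-importance reading, where λi = 1 means the position is filled from the document and λi = 0 means it is filled from the general language model.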





Publication date: 2001